Skip to content

Implement Debug for C-like enums with a concatenated string#155452

Open
makai410 wants to merge 2 commits into
rust-lang:mainfrom
makai410:enum-debug-array
Open

Implement Debug for C-like enums with a concatenated string#155452
makai410 wants to merge 2 commits into
rust-lang:mainfrom
makai410:enum-debug-array

Conversation

@makai410

@makai410 makai410 commented Apr 17, 2026

Copy link
Copy Markdown
Member

View all comments

Fixes: #114106 #133945

Related to: #88793

Continuation of: #109615 #114190

r? @ghost (I want to see the perf first)

@rustbot

rustbot commented Apr 17, 2026

Copy link
Copy Markdown
Collaborator

Changes to the code generated for builtin derived traits.

cc @nnethercote

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Apr 17, 2026
@makai410

Copy link
Copy Markdown
Member Author

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rust-bors

This comment has been minimized.

rust-bors Bot pushed a commit that referenced this pull request Apr 17, 2026
Implement `Debug` for C-like enums with a concatenated string
@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 17, 2026
@rust-bors

rust-bors Bot commented Apr 18, 2026

Copy link
Copy Markdown
Contributor

☀️ Try build successful (CI)
Build commit: 5a131e3 (5a131e34e34187b36455f6f6809f9cd14e7ab55f, parent: e9e32aca5a4ffd08cbc29547b039d64b92a2c03b)

@rust-timer

This comment has been minimized.

@rust-timer

Copy link
Copy Markdown
Collaborator

Finished benchmarking commit (5a131e3): comparison URL.

Overall result: ❌✅ regressions and improvements - please read:

Benchmarking means the PR may be perf-sensitive. It's automatically marked not fit for rolling up. Overriding is possible but disadvised: it risks changing compiler perf.

Next, please: If you can, justify the regressions found in this try perf run in writing along with @rustbot label: +perf-regression-triaged. If not, fix the regressions and do another perf run. Neutral or positive results will clear the label automatically.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

mean range count
Regressions ❌
(primary)
0.4% [0.2%, 2.4%] 50
Regressions ❌
(secondary)
0.9% [0.2%, 1.6%] 9
Improvements ✅
(primary)
-0.4% [-0.4%, -0.4%] 1
Improvements ✅
(secondary)
-0.0% [-0.0%, -0.0%] 2
All ❌✅ (primary) 0.4% [-0.4%, 2.4%] 51

Max RSS (memory usage)

Results (primary -3.6%, secondary -0.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
2.6% [1.2%, 6.0%] 5
Improvements ✅
(primary)
-3.6% [-5.4%, -2.5%] 4
Improvements ✅
(secondary)
-4.4% [-6.3%, -1.6%] 3
All ❌✅ (primary) -3.6% [-5.4%, -2.5%] 4

Cycles

Results (primary 2.8%, secondary 2.1%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
2.8% [2.8%, 2.8%] 1
Regressions ❌
(secondary)
2.1% [2.1%, 2.1%] 1
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 2.8% [2.8%, 2.8%] 1

Binary size

Results (primary 0.4%, secondary 0.7%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
0.5% [0.0%, 1.4%] 26
Regressions ❌
(secondary)
0.7% [0.1%, 1.1%] 8
Improvements ✅
(primary)
-0.7% [-1.1%, -0.3%] 2
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.4% [-1.1%, 1.4%] 28

Bootstrap: 492.421s -> 491.503s (-0.19%)
Artifact size: 394.25 MiB -> 394.27 MiB (0.01%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Apr 18, 2026
Comment thread compiler/rustc_builtin_macros/src/deriving/debug.rs Outdated
@makai410

Copy link
Copy Markdown
Member Author

There was a leftover condition from an earlier PR that I pulled in without a proper review, such that it actually didn't apply the optimization to large enums <_<;

I also want to try skipping bounds checks. I suppose it could improve runtime performance, at least on my machine, based on some pretty rough benchmarks with an enum of 10,000 variants.

@makai410

Copy link
Copy Markdown
Member Author

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rust-bors

This comment has been minimized.

rust-bors Bot pushed a commit that referenced this pull request Apr 18, 2026
Implement `Debug` for C-like enums with a concatenated string
@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 18, 2026
@nnethercote nnethercote self-assigned this Apr 18, 2026
@nnethercote

Copy link
Copy Markdown
Contributor

I'm happy to review this once it's ready.

@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 18, 2026
Comment on lines +284 to +288
let variant_names = def
.variants
.iter()
.map(|v| v.disr_expr.is_none().then_some(v.ident.name.as_str()))
.collect::<Option<ThinVec<_>>>()?;

@makai410 makai410 Apr 18, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hmmm, I think I missed a valid case, considering:

enum Uwu {
    QwQ = 0,
    AwA = 1,
}

which has explicit discriminants but is actually dense.

View changes since the review

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, we can use an offset when some variants have negative discriminants while the overall variants remain dense.

I'll do a follow-up PR to implement these.

@rust-bors

rust-bors Bot commented Apr 18, 2026

Copy link
Copy Markdown
Contributor

☀️ Try build successful (CI)
Build commit: 7348f26 (7348f264e27b4bbcab21ed426532865b98fc5f55, parent: 8da2d28cbd5a4e2b93e028e709afe09541671663)

@rust-timer

This comment has been minimized.

@rust-timer

Copy link
Copy Markdown
Collaborator

Finished benchmarking commit (7348f26): comparison URL.

Overall result: ❌✅ regressions and improvements - please read:

Benchmarking means the PR may be perf-sensitive. It's automatically marked not fit for rolling up. Overriding is possible but disadvised: it risks changing compiler perf.

Next, please: If you can, justify the regressions found in this try perf run in writing along with @rustbot label: +perf-regression-triaged. If not, fix the regressions and do another perf run. Neutral or positive results will clear the label automatically.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

Our most reliable metric. Used to determine the overall result above. However, even this metric can be noisy.

mean range count
Regressions ❌
(primary)
0.5% [0.2%, 7.1%] 50
Regressions ❌
(secondary)
1.2% [0.2%, 4.1%] 12
Improvements ✅
(primary)
-0.6% [-0.6%, -0.6%] 2
Improvements ✅
(secondary)
-0.2% [-0.5%, -0.0%] 13
All ❌✅ (primary) 0.5% [-0.6%, 7.1%] 52

Max RSS (memory usage)

Results (primary 0.7%, secondary 0.6%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
3.2% [1.8%, 6.9%] 5
Regressions ❌
(secondary)
2.4% [0.8%, 6.1%] 10
Improvements ✅
(primary)
-5.6% [-7.2%, -3.9%] 2
Improvements ✅
(secondary)
-5.6% [-6.3%, -4.8%] 3
All ❌✅ (primary) 0.7% [-7.2%, 6.9%] 7

Cycles

Results (primary 4.4%, secondary 3.2%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
4.4% [4.4%, 4.4%] 1
Regressions ❌
(secondary)
3.2% [1.8%, 5.1%] 3
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 4.4% [4.4%, 4.4%] 1

Binary size

Results (primary 0.4%, secondary 1.0%)

A less reliable metric. May be of interest, but not used to determine the overall result above.

mean range count
Regressions ❌
(primary)
0.6% [0.1%, 1.5%] 20
Regressions ❌
(secondary)
1.0% [0.1%, 2.1%] 20
Improvements ✅
(primary)
-0.3% [-0.5%, -0.0%] 8
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.4% [-0.5%, 1.5%] 28

Bootstrap: 493.313s -> 504.472s (2.26%)
Artifact size: 394.36 MiB -> 394.34 MiB (-0.01%)

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 18, 2026
@theemathas

Copy link
Copy Markdown
Contributor

This will probably run into #148423. Furthermore, I believe this is likely to cause unsoundness in actual programs, due to the enumflags2 crate which I think has a macro that modifies the discriminant values of enums.

@makai410

Copy link
Copy Markdown
Member Author

Given that it's a lang issue and we basically have no way to work around in this PR...

@rustbot blocked #148423

@rustbot rustbot added S-blocked Status: Blocked on something else such as an RFC or other implementation work. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Apr 19, 2026
@makai410

Copy link
Copy Markdown
Member Author

We still have compile time regression, I suppose that for small enums it might not be worth doing this kind of optimization, I don't think the runtime performance would vary a lot but the compile time has already increased a lot.

@rust-bors

rust-bors Bot commented Apr 26, 2026

Copy link
Copy Markdown
Contributor

☔ The latest upstream changes (presumably #155796) made this pull request unmergeable. Please resolve the merge conflicts.

@panstromek

panstromek commented May 28, 2026

Copy link
Copy Markdown
Contributor

@makai410 The piece of code I was looking for when we were discussing this at all hands is here: https://github.com/makai410/rust/blob/b16703bbee4b9ec6e790bbb88115c84be293635b/tests/ui/deriving/deriving-all-codegen.stdout#L445, the optimization was introduced in #98190

The derive generates array of names, and array of dyn Debug references to fields and passes that to internal non-generic debug_struct_fields_finish function. There are also specialized functions for smaller structs.

I think what you probably want to do here is very similar - generate just the arrays that are unique to each struct and pass them to some internal function to avoid duplicating a lot of similar code in multiple derives.

Also, with this approach it's maybe not even worth it to concatenate the string at all, and just use list of names and discriminant as an index, but we'd have to measure that.

@makai410

makai410 commented Jun 9, 2026

Copy link
Copy Markdown
Member Author

Also, with this approach it's maybe not even worth it to concatenate the string at all, and just use list of names and discriminant as an index, but we'd have to measure that.

Hi @panstromek, sorry for the delay. The reason for doing concatenated string is that there was a #109615 which tried a names array but it turned out to be not much of an improvement, and the idea of concatenated string was inspired by #133945 (comment), which suggested that the concatenated string could reduce the number of relocations, so I still want to try the concatenated string here.

As for the threshold number of variants, I was planning to write a benchmark tool to measure it. The idea is basically to use several well-known rust projects as compilation targets, add a custom compiler flag (-Zbike-shedding-threshold=100), sweep the threshold from 100 to 200 in steps of 20, measure the runtime improvement and compile-time overhead, and compute their ratio. A higher ratio means we trade less compile-time overhead for more runtime improvement. We can then plot the threshold on the X axis and the ratio on the Y axis, and then the threshold corresponding to the highest ratio is likely what we want, but worth noting here we still have to look at the actual number of the compile-time overhead at the highest ratio point, which might be a large number(?), to check if the number is acceptable. This is just a rough idea that can probably miss something, but well I think the overall direction is correct?

EDIT:
The problem might not be that complicated. I suspect there wouldn't be much difference between 100 and 200, since I don't think there are many enums with that many variants in this range. I suspect even 50 variants is rare in practice... So maybe I should just pick several favorite numbers and run several rounds of perf to see the results :)

@panstromek

Copy link
Copy Markdown
Contributor

I see, I didn't know about #109615, the concatenated string approach makes sense then.

So maybe I should just pick several favorite numbers and run several rounds of perf to see the results

Yea, I'd probably do something lazy like that :D. You can kinda bisect to it. But note that I'm not sure how many C-like enums with derive(Debug) we have in our benchmarks suite - especially big ones - so we might not be able to see the effect without adding a new one (or at least confirming it locally).


btw. Maybe it's also worth trying to pad all variants to the same length and indexing the resulting string by max_length * discriminant, to avoid generating second array. This way the derive would be bit more minimal (at the cost of some wasted space), something like this:

enum Fieldless0 { A, BB, CCC }
//...
const __NAMES: &str = "A  BB CCC";
debug_c_like_enum(
    __names,
   discriminant_value(self),
   3 /* max variant name len */
)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

perf-regression Performance regression. S-blocked Status: Blocked on something else such as an RFC or other implementation work. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Poor codegen for Debug impls of C-like enums with many variants

6 participants